[ALMIOPEN-1696] Re-enabling certain MIOpen tests with ASan enabled. #7904

Draft
NolanHannaAMD wants to merge 1 commit into develop from users/nhanna/asan-re-enable-safe-tests

@NolanHannaAMD

Motivation

This PR reduces the number of tests disabled under ASan to increase test coverage while still trying to avoid test breakages on CI. These tests were disabled in an early sweep, when many issues were being encountered. Several fixes have gone in since then, and the goal is to reduce the list to a more manageable size. The tests that remain disabled are split into two groups: one that may be re-enabled after further testing on other architectures/environments, and one that should remain disabled until the underlying issues are addressed.

Putting this PR up as a draft: the work has begun, but a full ASan gtest run is still necessary. All re-enabled tests passed individual testing.

Note: There is still ongoing work with the CK hangs that will hopefully address the remainder of the hangs. When that work is complete, more tests should ideally be able to come off this list.

Technical Details

Two categories of tests were re-enabled and two remain disabled. A full run of the gtests with ASan enabled should be done before this PR is considered for merging.

Definitely safe - re-enabled

The following files no longer exist and therefore could not cause issues when being re-enabled:

Potentially safe - re-enabled

The following are tests that all passed individual testing, but still need to be tested within a full gtest run before being merged:

  • bad_fusion_plan.cpp
  • cba_find2_infer.cpp
  • cba_infer.cpp
  • conv_activ_infer.cpp
  • conv_ai_3d_kernel_tuning_utils.cpp
  • find_2_conv.cpp
  • find_db.cpp
  • find_mode_trust_verify.cpp
  • fused_conv_bias_res_add_activ.cpp
  • group_conv_deterministic_split_k.cpp
  • group_conv2d_fwd.cpp
  • group_conv2d_bwd.cpp
  • group_conv2d_wrw.cpp
  • kernel_tuning_net.cpp
  • miopendriver_gemm.cpp
  • miopendriver_regression_big_tensor.cpp (also has a duplicate entry that can be dropped)
  • miopendriver_regression_half.cpp
  • perf_config_HipImplicitGemm3DGroupFwdXdlops.cpp
  • unit_conv_solver_ConvAsmImplicitGemmGTCDynamicBwdXdlops.cpp
  • unit_conv_solver_ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC.cpp
  • unit_conv_solver_ConvAsmImplicitGemmGTCDynamicFwdXdlops.cpp
  • unit_conv_solver_ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC.cpp
  • unit_conv_solver_ConvAsmImplicitGemmGTCDynamicWrwXdlops.cpp
  • unit_conv_solver_ConvAsmImplicitGemmGTCDynamicWrwXdlopsNHWC.cpp
  • unit_conv_solver_ConvHipImplicitGemm3DGroupBwdXdlops.cpp
  • unit_conv_solver_ConvHipImplicitGemmGroupFwdXdlops.cpp
  • unit_conv_solver_ConvHipImplicitGemmGroupWrwXdlops.cpp
  • unit_implicitgemm_ck_util.cpp
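Both kinds of cleanup mentioned above (entries for files that no longer exist, and the duplicate entry) can be found mechanically. The following is a minimal Python sketch, not part of this PR: the function name `audit_disabled_list` and the assumption that the disabled list is available as plain file names relative to a single test directory are hypothetical.

```python
from collections import Counter
from pathlib import Path


def audit_disabled_list(entries, test_dir):
    """Return (duplicates, missing) for a list of disabled test file names.

    entries  -- file names as they appear in the disabled list (hypothetical input)
    test_dir -- directory assumed to contain the gtest source files
    """
    counts = Counter(entries)
    # Names listed more than once; the extra copies can simply be dropped.
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    # Names whose source file is gone; such entries cannot affect any test.
    root = Path(test_dir)
    missing = sorted(name for name in counts if not (root / name).is_file())
    return duplicates, missing
```

Calling it with the current disabled list and the gtest source directory would return the names to de-duplicate and the names that can be removed outright.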

Untested - remain on disabled list

The following tests were skipped during individual testing, either because of the architecture or because their framework is disabled:

  • conv_ck_igemm_fwd_v6r1_dlops_nchw.cpp — solver not supported on gfx942
  • conv_hip_igemm_xdlops.cpp — test_drive<> framework disabled
  • conv_igemm_mlir_xdlops_bwd_wrw.cpp — test_drive<> framework disabled
  • conv_igemm_mlir_xdlops_fwd.cpp — test_drive<> framework disabled
  • smoke_solver_ConvCkIgemmFwdV6r1DlopsNchw.cpp — test_drive<> framework disabled
  • smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_fp32_fp16.cpp — test_drive<> framework disabled
  • smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp — test_drive<> framework disabled
  • unit_conv_solver_ConvHipImplicitGemmBwdDataV1R1Xdlops.cpp — gfx942 not supported
  • unit_conv_solver_ConvHipImplicitGemmBwdDataV4R1Xdlops.cpp — gfx942 not supported
  • unit_conv_solver_ConvHipImplicitGemmForwardV4R4Xdlops.cpp — gfx942 not supported
  • unit_conv_solver_ConvHipImplicitGemmForwardV4R4Xdlops_Padded_Gemm.cpp — gfx942 not supported
  • unit_conv_solver_ConvHipImplicitGemmForwardV4R5Xdlops.cpp — gfx942 not supported
  • unit_conv_solver_ConvHipImplicitGemmWrwV4R4Xdlops.cpp — gfx942 not supported
  • unit_conv_solver_ConvHipImplicitGemmWrwV4R4Xdlops_Padded_Gemm.cpp — gfx942 not supported

Unsafe

  • group_conv3d_fwd.cpp — CK 3D conv kernel hang
  • group_conv3d_bwd.cpp — CK 3D conv kernel hang
  • group_conv3d_wrw.cpp — CK 3D conv kernel hang
  • miopendriver_conv_immed.cpp — BFP16 subprocess hangs >60min
  • miopendriver_conv2d_trans.cpp — BFP16 trans conv extremely slow >27min
  • miopendriver_regression_half_gfx9.cpp — 3D FP16 subprocess hangs >25min
  • unit_conv_solver_ConvHipImplicitGemm3DGroupFwdXdlops.cpp — CK 3D fwd hang
  • unit_conv_solver_ConvHipImplicitGemm3DGroupWrwXdlops.cpp — CK 3D wrw hang
  • unit_conv_solver_ConvCkGroupedConvFwd.cpp — ASAN GPU crash in CK GridwiseGroupedConv2DFwd
  • unit_conv_solver_ConvHipImplicitGemmGroupBwdXdlops.cpp — ASAN GPU crash on TF32 dilation>

Test Plan

With an ASan enabled build, run the full MIOpen gtest suite that is run on CI to verify no problems are encountered.
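Since several of the known failures are hangs rather than crashes, it may help to run each test command under a timeout so that a hang is reported instead of stalling the whole run. This is a rough Python sketch under stated assumptions: the helper names `run_with_timeout` and `triage` are hypothetical, and the binary path and filter in the comment are placeholders, not the actual CI invocation. `--gtest_filter` is the standard GoogleTest flag for selecting tests.

```python
import subprocess


def run_with_timeout(cmd, timeout_s, env=None):
    """Run one test command and classify the outcome as passed/failed/hung."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "hung"
    return "passed" if result.returncode == 0 else "failed"


def triage(commands, timeout_s):
    """Map each named command to its outcome so hangs can be listed separately."""
    return {name: run_with_timeout(cmd, timeout_s) for name, cmd in commands.items()}


# Hypothetical usage (binary path and filter are placeholders):
#   triage({"find_db": ["./bin/some_gtest_binary", "--gtest_filter=*FindDb*"]}, 1800)
```

Only the direct child process is killed on timeout; tests that spawn their own subprocesses, such as the miopendriver tests, may need extra cleanup.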

Test Result

TBD: If no issues are encountered, this should be safe to merge. If any hangs or other failures are encountered, the affected tests should be moved back to the top disabled list (with a specific comment mentioning that the issue only manifests on a full run).

Risk Assessment

Currently, I would rate this as MEDIUM, as the full gtest suite still needs to be re-run together under ASan. Issues could affect the ASan CI and reintroduce failures that were experienced before; however, this will not impact non-ASan builds at all. Once the full test run is complete, if no issues are encountered, I would downgrade this to LOW.
